dataset size
A Appendix
A.1 UniBench Implementation Details
We have developed UniBench
To evaluate new VLMs beyond the 59 already implemented, users need to follow Code Snippet 2. Users need to create a class that inherits from …

As described in Section 2.2, LLM-style models are defined as models that generate tokens/text as output, which makes them hard to compare with CLIP-style VLMs. Following the methodology of Matsuura et al. [2023], we evaluated LLaVA 1.5 [Liu et al., 2023], an LLM-style VLM, on various benchmark types in UniBench (Table 2).

Scaling improves many benchmarks, but offers little benefit for reasoning and relations.

Figure 8: Benchmark capability performance does not scale with dataset and model size. Median zero-shot performance of models on various benchmark capabilities.
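The Figure 8 caption reports a median over models for each benchmark capability. As a rough illustration of that aggregation only, the sketch below groups per-model scores by capability and takes the median of each group. The model names, the "object recognition" label, and all numeric values are hypothetical placeholders, not results from the paper, and the function `median_by_capability` is an illustrative helper rather than part of UniBench.

```python
from statistics import median

# Hypothetical placeholder scores (NOT results from the paper):
# model name -> {capability -> zero-shot accuracy in [0, 1]}
scores = {
    "model_a": {"object recognition": 0.60, "reasoning": 0.20, "relation": 0.40},
    "model_b": {"object recognition": 0.70, "reasoning": 0.25, "relation": 0.45},
    "model_c": {"object recognition": 0.80, "reasoning": 0.30, "relation": 0.50},
}


def median_by_capability(scores):
    """Return {capability: median score across all models that report it}."""
    per_capability = {}
    for model_scores in scores.values():
        for capability, value in model_scores.items():
            # Collect every model's score under its capability name.
            per_capability.setdefault(capability, []).append(value)
    return {cap: median(vals) for cap, vals in per_capability.items()}


medians = median_by_capability(scores)
for capability, value in medians.items():
    print(f"{capability}: {value:.2f}")
```

A median is less sensitive than a mean to a single unusually strong or weak model, which is one plausible reason to prefer it when summarizing a large and heterogeneous set of models.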
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
We find that while scaling training data or model size can boost many vision-language model capabilities, scaling offers little benefit for reasoning or relations. Surprisingly, we also discover that today's best VLMs struggle on simple digit recognition and counting tasks, e.g., MNIST, which much simpler networks can solve.